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Infectious diseases outbreaks are often characterized by a spatial component induced by hosts' distribution, 
mobility, and interactions. Spatial models that incorporate hosts' movements are being used to describe 
these processes, to investigate the conditions for propagation, and to predict the spatial spread. Several 
assumptions are being considered to model hosts' movements, ranging from permanent movements to daily 
commuting, where the time spent at destination is either infinite or assumes a homogeneous fixed value, 
respectively. Prompted by empirical evidence, here we introduce a general metapopulation approach to 
model the disease dynamics in a spatially structured population where the mobility process is characterized 
by a heterogeneous length of stay. We show that large fluctuations of the length of stay, as observed in reality, 
can have a significant impact on the threshold conditions for the global epidemic invasion, thus altering 
model predictions based on simple assumptions, and displaying important public health implications. 

Space represents, in many circumstances, a relevant feature in the spread of an infectious disease 1 . This is the 
case of recent outbreaks that reached the global scale, such as e.g. the SARS outbreak in 2002-2003 2 and the 
2009 H1N1 pandemic influenza 3,4,5 , but also applies to country or region-specific epidemics 6 or even 
outbreaks at smaller scales 7,8 . The diffusion of a directly transmitted disease in a given environment depends 
on the spatial distribution of susceptible hosts and their ability to move from one region to the other of the space, 
connecting otherwise isolated communities. 

The recent availability of high-resolution large datasets of hosts' population spatial distribution and mobility 
have enabled the development of spatially based modelling approaches that, by integrating the above knowledge 
with the disease description, are able to provide a theoretical and quantitative framework to describe the disease 
behaviour in time and space 1,9-14 . Hosts' movements represent a key ingredient of such approaches, and different 
modelling approximations are made depending on the type of mobility process, or based on simplification 
considerations, or due to the constraints imposed by limited available knowledge. Movements may be permanent, 
as in the case of human migrations 15 or livestock trade displacements 16,17 , or may be characterized by origin- 
destination patterns with subsequent return to the origin, as in the case of human travel in general. The latter case 
requires the explicit introduction of an element of memory in the process, as well as the specification of the 
duration of the hosts' visit at destination. Short trips within Europe during holiday period in spring-summer 2009, 
for example, contributed to the diffusion of the 2009 H1N1 pandemic in the continent, following the first 
European outbreaks seeded by Mexico and the US 18 . Different travel habits and trip durations, induced by 
geographical, cultural and seasonal conditions, may have different outcomes. In particular, the timescale of 
the length of stay at destination is an important aspect of the spatial spreading process, as it represents the time 
window during which the disease can be brought to the unaffected population at destination by an infectious 
traveller visiting that given destination, or during which the healthy traveller may contract the disease from the 
outbreak taking place at destination while visiting (as e.g. in the H1N1 case above). Given its relevance for 
spreading processes, and the large variability of individual travel behaviour affecting the reasons for travel and 
associated frequency and durations of trips, here we consider a modelling approach that explicitly includes this 
timescale and its fluctuations, aiming at the understanding of the inclusion of additional ingredients in ever more 
realistic data-driven models. Several degrees of heterogeneities are indeed found to characterize many aspects 
relevant to epidemic spreading - from the individual level 19 22 to the population level 1016,23,24 - and to have a 
profound impact on epidemic phenomena occurring on such complex substrates 22,25 31 . Here, through a meta- 
population theoretical framework and detailed simulations, we address the inclusion of an extra degree of 
complexity, namely the heterogeneous length of stay of hosts at destination, and discuss its impact with respect 
to simpler modelling assumptions. 
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Results 

Metapopulation model with heterogeneous length of stay. We 

consider a population of hosts that is spatially structured into V 
subpopulations coupled by hosts' movements, representing a 
metapopulation network where nodes correspond to subpopulations 
where the infection dynamics takes place, and links correspond to the 
mobility between origin and destination subpopulations. Analyses of 
human mobility data and transportation infrastructures have shown 
the presence of various levels of heterogeneities characterizing the 
movement patterns 83135 . The structure of these patterns often 
displays large variability in the number of connections per mobility 
centre (whose corresponding geographic census area represents the 
subpopulation in the metapopulation framework), and broad 
fluctuations are also observed in the traffic exchanged by each 
mobility centre or flowing on a given connection between origin 
and destination. In Figure 1 we report the example of the European 
air transportation network 36 , showing the probability distributions of 
the number of connections k per airport (i.e. the subpopulations 
degree, panel a) and of the total traffic of passengers Wy travelling 
between any pair of linked airports i and j (Figure lb). In addition, 
general statistical laws can be found that relate the traffic flow between 
any two geographic areas and their properties, such as population 
sizes, number of connections or other features 9,10,2334,37 , depending 
on the specific system under study. In the case of the air tran- 
sportation network, the behaviour of the average weight w kk > 
flowing along the connection between two airports with degree k 
and k' is found to be a function of the product of the degrees 26 , i.e. 
(wjyt ) <x (fcfc ) . Empirical data on the population sizes of the 
geographical census areas associated to mobility patterns also show 
scaling relations in terms of the subpopulation degree k, with N k cck^ 
where N k is the population size of the census area with k air-travel 
connections 9 . These features have been observed at various resolution 



scales and for different types of mobility modes 9,2637,3839 , including 
commuting processes 38 and within-cities daily movements 2739 . 

In order to capture these empirical properties, we consider a meta- 
population model characterized by a random connectivity pattern 
described by an arbitrary degree distribution P(k), and in the follow- 
ing we will explore the role of realistic heterogeneous network struc- 
tures and also homogeneous network structures for comparison. To 
take into account the large degree fluctuations empirically observed, 
we describe the system by adopting a degree-block notation 25 that 
assumes statistical equivalence for subpopulations of equal degree 
A: 27 . This corresponds to assuming that all subpopulations having the 
same number of connections are considered statistically identical. In 
other words, subpopulations with degree k are characterized by the 
same population size N k and by the same mobility properties, includ- 
ing the length of stay and the traffic of individuals, as illustrated 
below. While disregarding more specific properties of each indi- 
vidual subpopulation - that may be related for instance to local, 
geographical or cultural aspects - this mean field approximation is 
able to capture the degree dependence of many system's properties as 
observed in the data, and also allows for an analytical treatment of the 
system's behavior 27 . 

Following the scaling properties observed in real-world mobility 
data, we define: (i) the population size N k of a subpopulation with 
degree k as N k = Nk' l '/(k' ! '), with N = X k N k P(k) being the average 
subpopulation size of the metapopulation system; (ii) the number 
of individuals moving from the subpopulation of degree k to the 
subpopulation of degree k' as w kk = Wo (kk ) . The exponents (j> 
and 0, and the scaling factor w 0 assume different values depending 
on the application to the mobility process under consideration. The 
mobility of hosts along the links of the metapopulation network is 
modelled with the per capita diffusion rate a kk = w kk / N k where we 
assume that individuals are randomly chosen in the population 
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Figure 1 | Heterogeneous topology, traffic, and length of stay of the mobility network by air travel in Europe. Statistical fluctuations are observed over a 
broad range of length and time scales, (a) The degree distribution P(k) of the European airport network is characterized by large fluctuations indicating a 
heterogeneous topology of the system, (b) The distribution of weights in the European airport network is skewed and heavy-tailed, (c) The number of 
nights spent by foreign travellers visiting the UK (in blue) and by British travelling abroad (in red) spans several orders of magnitude, from 1 night to 
several months, considering all travel purposes and all countries of origin and destination, (d) The number of nights spent by tourists travelling to 
European countries on holiday is similarly broadly distributed. Here we show data for 5 selected countries of destination, considering only trips of 4 nights 
or more (source: Eurostat). 
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according to the patterns w kk . The overall leaving rate out of the 
subpopulation with degree k is given by a k = k ~^2 k a kk P(k \kj where 
P(k \kj is the conditional probability that a node with degree k is 
connected to a node with degree k , and we define the diffusion rate 
rescaling as a = w 0 (k^) /N. Finally, individuals return to their origin 
after an average time z k that corresponds to the length of stay at 
destination with degree k , or the duration of their visit. 

The economic literature considers the length of stay as one of the 
key elements that need to be solved in modelling a visitor's decision- 
making process 38 . Its determinants, however, are still largely debated 
and no clear consensus has emerged on the problem 39 . Empirical data 
on the length of stay at the individual level is rather scant, but many 
statistical sources clearly indicate a vast heterogeneity characterizing 
this quantity. Figure lc reports the probability distribution of 
the length of stay of people travelling to and from the United 
Kingdom for all purposes, showing large fluctuations ranging from 
1 day or less, up to several months 40 . Similar broad distributions are 
observed in the travel patterns of visitors that spend their holidays in 
Europe (Figure Id), and from other sources of travel statistics as 
reported in the Supplementary Information (SI). Next to air travel, 
heterogeneous duration of visits are also observed in human traject- 
ories characterizing daily activities 8,32,34 , ranging from few minutes to 
several hours. To incorporate these fluctuations in the metapopula- 
tion modelling framework, we assume that the length of stay at 
destination x k is a scaling function of the degree k of the subpopula- 
tion of destination, i.e. 



where z/{k x ) is the normalization factor, and z = ^2 k z k P(k) is the 
average length of stay on the metapopulation network. By tuning the 
value of the power-law exponent x, we can define different regimes of 
the mobility dynamics. For X>0 the length of stay is positively cor- 
related with the degree of the subpopulation of destination, meaning 
that individuals travelling to a well-connected location will spend a 
longer time at destination, thus being longer exposed to the local 
population, with respect to individuals travelling to peripheral loca- 
tions. This can be motivated by the attractiveness of popular loca- 
tions, for which the pattern of connection is optimized through large 
degrees to manage large fluxes volumes of individuals, both at the 
within-city scale and at the larger geographical scale where airport 
hubs handle large traffic due to tourism or seasonal/temporary job 
opportunities 41,42 . The opposite regime, obtained for y<0, implies 
that the time spent at a location is larger for decreasing degree of the 
subpopulation of destination, and may correspond to an individual 
choice of optimization between the time spent to reach the destina- 
tion and the time spent at destination. Low degree locations are 
indeed generally peripheral in the system, so that a trip from a given 
origin may take multiple steps to reach the final destination, and thus 
a longer length of stay at the hard-to-reach destination may then 
adequately balance and justify the travel time 43 . The value % = 0 
corresponds to the case of homogeneous length of stay as it is gen- 
erally assumed in the recurrent mobility process of commuting 
where the time z represents the duration of an average working 
day 10,16,30 , whereas the condition X-^- 00 corresponds to the case of 
permanent migration that is used to model mobility processes with 
no return to the origin (such as the case of livestock displacements in 
trade flows 16,17 ) or to approximate origin-destination mobility by 
simplifying the modelling approach and assuming a Markovian pro- 
cess 27,44 . Here, instead, we are interested in understanding whether if 
and to what extent the empirically observed heterogeneity of the 
length of stay may affect the invasion dynamics of the disease and 
its geotemporal spread, by assuming that this heterogeneity is fully 
encoded in the connectivity properties of the subpopulation of des- 
tination. The assumption on the geographical dependence of the 
length of stay, determined by the degree k of the location, finds its 



support in travel statistics available at the city level that aggregate 
various travel modes and reasons (see the SI). Empirical evidence 
from higher detail datasets on human movements that would allow 
us to identify a specific functional form for Eq. (1) is currently lack- 
ing, therefore we adopt a rather simple power-law behaviour, con- 
sistently with the other scaling properties of human mobility, 
enabling the analytical study of the epidemic invasion process in 
the system under variations of the assumed parameters while pre- 
serving the heterogeneity of z. More sophisticated assumptions can 
be made on the expression of z, that may depend, for instance, both 
on the origin and destination subpopulations, or on the individual 
behaviour, and will be the object of future studies. 

In order to take into account the memory effects associated with 
the mobility process (i.e. the return to the location of origin), we 
subdivide the population of individuals N k resident in the subpopu- 
lation of degree k into those individuals who are from k and are 
located in k at time f, N kk (t), and those who are from k and are located 
in a neighbouring subpopulation having degree k ' at time t , N kk ( t ) . A 
schematic representation of the system, illustrating the subdivision of 
the populations for two connected locations, is shown in Figure 2, 
and the definition of the degree-block variables is reported in Table 1 . 
This formalism 10,30,45,46 allows keeping track of the origin subpopula- 
tion, and the mobility process is mathematically represented by rate 
equations describing the time behaviour of N kk (t) in terms of the 
diffusion and return rates, as illustrated in the Methods section. The 
overall dynamics is characterized by the interplay between the time- 
scales of the mobility process and the intrinsic timescale of the infec- 
tious disease. Realistic values of the mobility rates fall in the regime 
a k <Hz k l , as the fraction of travellers of a population on the typical 
timescale of the movement is very small, as for instance observed 
from air travel 9 and commuting data 10 . By focusing on infectious 
diseases characterized by relatively long timescales, a quasi-station- 
ary approximation can be considered that assumes the mobility pro- 
cess to be at equilibrium with respect to the epidemic process 10,30,46 , 
with the subpopulation sizes assuming their stationary values: 




Figure 2 | Schematic representation of the non-Markovian travelling 
dynamics. At any time the subpopulation with degree k is occupied by a 
fraction of its own population N kk (individuals resident in the 
subpopulation with degree k and currently located there) and a fraction of 
individuals N k ' k whose origin is the neighbouring subpopulation with 
degree k' and who are currently visiting the degree k subpopulation. 
Travelling individuals from the subpopulation with degree k (light blue 
particles) leave their home subpopulation to the subpopulation with 
degree k' with rate a kk and return back with rate t^ 1 , where z kk is the 
average time spent at destination. Here we assume that the length of stay is 
function of the destination only, namely z kk = z k . This exchange of 
individuals between subpopulations is the origin of the transmission of the 
contagion process among the subpopulations. 
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Table 1 | Definition of the degree-block variables 

Defin ition 



Degree-block 
variable 



k Degree, i.e. the number of connections of a subpopulation 

Vk Number of subpopulations with degree k 

N k Individuals resident in subpopulations of degree k 

N kk Individuals resident in subpopulations of degree 

k located in subpopulation of degree k' 
x k Length of stay of individuals at subpopulations with degree k 

w kk Number of individuals leaving a subpopulation 

of degree k to a subpopulation of degree k' 
<j kk Leaving rate of travellers from subpopulations 

of degree k to subpopulations of degree k' 
o k Total leaving rate out from subpopulations of degree k 

T kk ' Total volume of people travelling on a connection between 

subpopulations of degree k and subpopulations of degree k' 



N kk - = 



0N1 

WW) 



v k k e k B+ * 



with the factor given by v/t = I 1 



-OT 



S+x+l\ 



> fc 9-#+l 



(km 

(see the Methods section for the full derivation). 

While mobility connects different locations, the epidemic process 
is modelled within each subpopulation, where we assume a standard 
SIR model that partitions the total number of individuals into sus- 
ceptible, infectious, and recovered individuals 47 . 

The SIR infection dynamics is ruled by the transition rates, fi and 
fi, representing the disease transmissibility rate (for the transition 
from susceptible to infectious) and the recovery rate (for the trans- 
ition from infectious to recovered), respectively. The epidemic model 
is characterized by the reproductive number 47 R 0 = p/fi, that defines 
the average number of infected individuals generated by one infec- 
tious individual in a fully susceptible population, thus leading to the 
threshold condition -R 0 >1 for an epidemic in a single population 47 . 

The quasi-stationary approximation adopted to describe the sub- 
population sizes in terms of Eqs. (2) and (3) is ensured by considering 
an infectious period longer than the typical length of stay at destina- 
tion, i.e. \i ~ 1 S> I* . This is well supported in the case of many relevant 
examples, such as influenza or SARS with of the order of days, 
when the metapopulation framework is applied to the geotemporal 
scale of human daily activities, characterized by broad distributions 
of time spent at each activity location 83234 however limited by a cutoff 
of approximately 17 hours capturing the typical awake interval of an 
individual 8 . In the following we will solve the system in the relatively 
long infectious period approximation: we will characterize the 
dependence of the invasion behaviour on the length of stay para- 
meters x an d we will show through numerical simulations that the 
behaviour is still valid when the approximation does not hold any- 
more, making this approach applicable to a variety of geotemporal 
scales regarding infectious diseases and types of mobility, ranging 
from hourly activities to daily and monthly trips. 

Threshold condition for global invasion. While R 0 provides a 
threshold condition for the occurrence of the outbreak, the 
epidemic spread at the global level is clearly dependent on the 
spatial structure of the population. An outbreak may indeed start in 
a seeded subpopulation, but it may not be able to spread to the 
neighbouring subpopulations due to specific conditions related to 
the mobility process. For example, the diffusion rate may be small 
enough as to prevent the travel of an infectious individual before the 
epidemic ends in the seeded subpopulation; or, the time spent at 
destination by the travelling infectious individual is too short for 
her/him to transmit the disease to individuals of the local 



population before returning to the subpopulation of origin. The 
occurrence of these events has a clear stochastic nature and is 
captured by the definition of an additional predictor of the disease 
dynamics, R*, regulating the number of subpopulations that become 
infected from a single initially infected subpopulation 47 50 , analogously 
to the reproductive number Ro at the individual level. In the following, 
we show that it is possible to provide an analytical expression for this 
threshold parameter assuming non-homogeneous origin-destination 
patterns of mobility captured by Eq. (1), leading to non- trivial effects 
in the spreading dynamics, and going beyond the cases of permanent 
migration 2833 and homogeneous mobility processes 30 . 

Let us consider the invasion process of the epidemic spread at the 
metapopulation level, by using the subpopulations as our elements of 
the description of the system, in a Levins-type modelling approach. 
We assume that the outbreak starts in a single initially infected 
subpopulation of a given degree k and describe the spread from 
one subpopulation to the neighbour subpopulations through a 
branching process approximation 51 . We denote by the number 
of infected subpopulations of degree k at generation n, with D° k being 
the initially seeded subpopulation, D l k the subpopulations of degree 
k! of generation 1 directly infected by D° k through the mobility pro- 
cess, and so on. By iterating the seeding events, it is possible to 
describe the time behaviour of the number DJ! of infected subpopu- 
lations as follows: 



Dj=z tf i#-i(fc'-i)P(*|k'; 



\-R 0 » 



1-S 



v k 



(4) 



The r.h.s. of equation (4) describes the contribution of the subpopu- 
lations D^ 1 of degree k' at generation n — 1 to the infection of 
subpopulations of degree k at generation n. Each of the DS~ 1 sub- 
populations has k' — l possible connections along which the infection 
can spread. The infection from D?~ 1 to D k occurs if: (i) the connec- 
tions departing from nodes with degree k' point to subpopulations 
with degree k, as ensured by the conditional probability P(k\ k ) ; (ii) 
the reached subpopulations are not yet infected, as indicated by the 

probability I 1 — ZJJ," g — — J , where is the number of subpopula- 



tions with degree k; (iii) the outbreak seeded by A^ k infectious indi- 
viduals travelling from subpopulation k' to subpopulation k takes 

place, and the probability of such event is given by ^1 — R 0 /l "^ 52 . 

The latter term is the one that relates the dynamics of the local 
infection at the individual level to the coarse-grained view that 
describes the disease invasion at the metapopulation level. Given 
the subpopulation of degree k' where an outbreak is taking place, 
the spread of the infection to a neighbouring disease-free subpopula- 
tion of degree k can occur due to the travel of infected individuals 
resident in k' who interact with the population of k during their visit, 
or to the infected individuals resident in k who come back to their 
subpopulation of origin after a trip to the subpopulation k! . If we 
denote by a the attack rate of the SIR epidemic, i.e. the total fraction 
of the population that is infected by the epidemic, then we can 
express the number of seeds as 30 X^^ = a.{N^ +Nfc-&) where we 
assume the stationary expressions for the populations N kk - and 
N k - k given by Eqs. (2) and (3). By plugging this expression into 
equation (4), and assuming an epidemic unfolding on an uncorre- 
cted metapopulation network with a reproductive number close to 
the threshold value, R 0 ~ 1, it is possible to analytically solve equation 
(4) for the early stage of the epidemic process, and obtain the fol- 
lowing expression for the global invasion threshold: 



2(flo-l) 



- A({P(k)},a,r,N) 



-oNx 



(WW) 



(5) 



Where 
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A({P(k)},a,t,N) = ((k— l)k 2B+ * +1 v k ) + 



^((fc-l)fc 2 ( fl +*) + 1 )((/c-l)fc M + 1 vj>) 



(6) 



is a functional form of the degree distribution and of its moments, of 
the average length of stay t, of the leaving rate rescaling a, and of the 
average subpopulation size N (see the Methods section for the full 
derivation). 

Equation (5) defines the threshold condition for the global inva- 
sion: if R, assumes values larger than 1, the epidemic starting from a 
given subpopulation will reach global proportion affecting a finite 
fraction of the subpopulations; if instead K*<1, the epidemic will be 
contained at its source without invading the metapopulation system. 
While assuming a simple SIR model with homogeneous mixing for 
the disease dynamics, it is important to note that stochasticity and 
discreteness of the mobility events are fundamental aspects to be 
considered in order to obtain the expression for R*. A continuum 
approximation would indeed lead to invasion under all conditions of 
mobility and length of stay, after the initial outbreak in the seed, due 
to the movements of anyhow small fractions o^l^ of infectious 
individuals 485053 . 

Impact of length of stay. The global invasion threshold _R„ is a 
combination of several ingredients of the system, including disease 
parameters, demography, metapopulation network structure, travel 
fluxes and mobility timescales. Figure 3 shows the role played by 
the topological heterogeneity of the metapopulation network and 
the topologically coupled heterogeneous length of stay, by reporting 
the condition R,>1 in the comparison between different network 
structures (panel a). A heterogeneous metapopulation network 
structure with size V=10 4 and characterized by a power-law degree 
distribution P(k)cck~ r with }>=3 and average degree (fc)=3 is 
compared to a homogeneous random topology of the same size, 
where every node is assumed to have a degree equal to (k). When 
no heterogeneity in the length of stay is considered, i.e. for y = 0, 
heterogeneous structures dramatically favour the epidemic invasion, 
as indicated by the threshold R, = l occurring at smaller va- 
lues of the reproductive number with respect to the homogeneous 



one. This means that, in this regime, there is an interval in the 
-R 0 >1 region for which the disease transmission potential is small 
enough so that the epidemic can be contained in a homogeneous 
system, but large enough to reach global proportion on a 
population that is heterogeneously spatially structured. This finding 
confirms previous results on the role of large degree fluctuations in 
reducing the threshold value for the disease invasion in a 
metapopulation system with Markovian 28,33 or recurrent 
homogeneous mobility 30 . When y¥=0, this effect is complicated by 
the inclusion of an additional layer of complexity. By increasing the 
value of x> R* considerably increases its value, leading to a 
corresponding reduction of the critical value of the reproductive 
number above which the global invasion occurs, thus further 
favouring the epidemic spread even for very small Rg by means of 
an additional mechanism. This is obtained by keeping fixed the 
average value of the length of stay x. Travelling hosts spend a longer 
time visiting largely connected subpopulations than peripheral ones, 
further enhancing the spreading potential of the hubs 27 and thus 
making the disease propagation on the meta- 
population system increasingly more efficient. On the contrary, 
negative and decreasing values of y tend to suppress the critical 
spreading potential of the hubs, by balancing their large degree with 
increasingly smaller lengths of stay. The degree heterogeneity is 
counterbalanced by the fluctuations of the length of stay expressed 
by its negative correlations with the subpopulation degree; the result is 
the increase of the critical threshold in the value of -Ro that 
distinguishes containment from disease invasion. This counter- 
balancing effect is not observed in the case of a homogeneous 
network, due to the absence of degree fluctuations. In the regime 
X<0, the spreading potential of the heterogeneous metapopulation 
network induced by the degree fluctuations is increasingly lowered by 
the chosen approximation for the length of stay, until it effectively 
reduces to the homogenous case, as displayed in Figure 3. In some 
circumstances, for given values of the parameters, the spreading 
potential in the heterogeneous case becomes smaller than in the 
corresponding homogeneous one, due to the suppressing role of 
hubs characterized by very small length of stay, as reported in the 
SI. 



Rn 13 



Heterogeneous network 
Homogeneous network 




R* 10 




Figure 3 | Threshold condition for global invasion: analytical results, (a) Invasion regions in the plane (Ro, y) corresponding to the condition R„> 1 of 
Eq. (5) for a heterogeneous metapopulation network (blue) with size V= 10" characterized by a power-law degree distribution P(k) cck~ y with y=3 and 
average degree (k) = 3, and a homogeneous network (red) with same size and subpopulations degrees uniformly equal to (k). Other parameters are set 
equal in the two cases: diffusion rate a= 10 5 and scaling exponents 8= 1/2 and 0 = 3/4 as in the worldwide airport network*' 26 , average subpopulation size 
N= 10 3 , and assumed average length of stay t = 37 to justify the timescale separation assumption and also allow the comparison with the numerical 
results. Since these parameters depend on the specific application under study, we report in the SI an exploration of additional parameter values, (b) 
Analytical surface of the global invasion threshold, R „ = (Ro,y), as a function of the reproductive number Rg and of the parameter y tuning the 
heterogeneity of the length of stay. The surface is calculated in the case of the heterogeneous network considered in panel a. 
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Figure 4 | Threshold condition for global invasion: simulation results. 

(a) Average final fraction of infected subpopulations D„/V(symbols) and 
standard deviations (error bars) obtained from the simulations of an SIR 
epidemic with _Ro= 1.2 as a function of the exponent x tuning the length of 
stay, for different values of the diffusion rate rescaling a. Coloured arrows 
indicate the analytical value of the corresponding transition from no 
invasion to the global invasion regime. Error bars correspond to the 
standard deviation obtained from 500 stochastic simulations, (b) Average 
final fraction of infected subpopulations D-JV (symbols) and standard 
deviations (error bars) as a function the basic reproductive number Rq, for 
different values of the length of stay parameter y, and for diffusion rate 
rescaling <r=10~ 5 . Coloured arrows as above. (c,d) Comparison of the 
simulated average fraction of infected subpopulations £W Vas a function 
of x and a for a heterogeneous network (c) and a homogeneous network 
with same size and average degree (d). Here _Ro=1.2. In all panels the 
parameter values are kept equal to the ones used in Figure 3 to allow for the 
direct comparison between analytic and numerical results. In addition, the 
infectious rate is /(=0.002 (in arbitrary time units) to ensure the timescale 
approximation. 



In the interval [—1,1] of y under consideration, the absolute vari- 
ation on the critical value of -R 0 corresponding to the invasion thresh- 
old condition is approximately equal to 30%, and its relative variation 
on the outbreak condition -R 0 >1 is of more than 90% (Figure 3a), 
thus showing the importance of the fluctuations in the length of stay 
in the outbreak management. Smaller or larger variations can be 
obtained as they depend on the specific metapopulation system 
under study, the type of mobility, the structure of the network and 
the geotemporal resolution scale considered. For instance, ground 
workflows are found to be one order of magnitude larger than air 
travel flows 10 , thus affecting the value of a in Eqs. (5) and (6), whereas 
different levels of degree heterogeneity can be found depending on 
the type of mobility network or the region under study. In the 
Supplementary Information we provide additional examples of the 
invasion region R„>1 by varying the model parameters that are 
application-specific. 

Next to the invasion threshold condition R,= 1 it is also important 
to investigate the absolute value of the predictor R* above the thresh- 
old, as its distance from the critical value, i.e. R f — 1, is a quantitative 
measure of the public health efforts that need to be put in place for the 
containment of the disease. Figure 3b shows the 3D surface of the 
global invasion threshold R, as a function of the reproductive num- 
ber _R 0 and of the parameter % tuning the heterogeneity of t. The 
values of _R„ rapidly increase in the invasion region, reaching very 
high values even for set of (i? 0 , yj values close to the critical ones. 
While maintaining fixed the mobility network structure and other 
system features (such as e.g. the population size of each subpopula- 
tion), it is possible to envision control strategies in terms of reduc- 
tions in the leaving rate occurring during the outbreak, thus encoded 
in reductions of the leaving rate rescaling a in Eq. (5). These would 
correspond to the application of travel-related intervention measures 
but also to the effects of self-reaction of the population who avoid 
travelling to the affected area, as observed during the early stage of 
the SARS outbreak and of the 2009 H1N1 pandemic 54 . The mag- 
nitude of such reduction would likely need to be, however, extremely 
large to bring the value of _R* below the threshold in a range of 
realistic parameter values, as observed from the 3D surface, thus 
explaining the failure of feasible travel restrictions aimed at the con- 
tainment of the disease 54 . 

Synthetic networks and numerical simulations. In order to test the 
validity of our analytical treatment, we performed extensive Monte 
Carlo numerical simulations at the individual level, by keeping track 
of the disease dynamics, of the movement and of its associated 
memory for each host in the system. Simulations are based on 
discrete-time stochastic processes, as detailed in the Methods 
section. We explored the phase space in order to make a direct 
comparison with the analytical findings discussed above, and con- 
sidered heterogeneous and homogeneous metapopulation networks 
having the same size and average degree. More in detail, consistently 
with the uncorrelated approximation used in the analytical 
framework developed, we consider heterogeneous uncorrelated ran- 
dom networks generated through the uncorrelated configuration 
model 55 that allows the creation of a network structure charac- 
terized by a given degree distribution P(k) and having no topolo- 
gical correlations. Homogeneous random networks are obtained by 
generating Erdos Renyi graphs 56 through wiring each couple of nodes 
with probability (k)/( V-l), were (k) is the prescribed average node 
degree, and degrees are distributed according to a Poisson distri- 
bution. Once the networks are constructed, all other demographi- 
cal and mobility quantities are also defined, based on the scaling 
relations assumed for the length of stay x k (Eq. (1)), the population 
Nfo the diffusion rate , where different diffusion scenarios can be 
explored by tuning the diffusion rate rescaling a. By fixing the 
average length of stay t, we explore values of the exponent y in the 
interval [ — 1, 0.4] to ensure the applicability of the time-scale 
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Figure 5 | Epidemic invasion trees. The cases of a positively-correlated (%=0A) and negatively-correlated (/= — 1) length of stay are shown. The 
synthetic network is characterized by a power-law distribution P(k) oc iT T with y=2.1, size V= 10" subpopulations and average population size JV= 10 3 . 
An SIR dynamics starting from the same seeding node (at the centre of each visualization) is simulated, with Rq= 1.8, o— 1CT", x = 37, and ;i=0.002 (in 
arbitrary time units) to ensure the timescale approximation. Only the first 120 nodes to be infected are displayed for the sake of visualization, on successive 
layers of invasion. Larger width grey links correspond to the paths of infection and lighter grey ones to the existing connections among visible nodes. 
Nodes are colour coded according to the time of their seeding, and their size scales with their degree; nodes in the first layer are ordered according to their 
degree to highlight the role of different degree nodes in the hierarchical invasion pattern in the two cases. 



separation approximation as well as feasible computational times. 
Starting from one seeded subpopulation, we simulate 500 stochastic 
realizations of the spatial epidemic simulation averaging on different 
initial conditions, and on different realizations of the metapopu- 
lation network that constitute the spatial structure of the popu- 
lation under study. For each simulation we can calculate the 
number of infectious individuals in time in the whole system and 
in each subpopulation, and evaluate the number of subpopulations 
reached by the outbreak as a function of time. 

Figure 4 shows the final size of the epidemic, expressed in terms of 
the simulated final fraction of infected subpopulations in the system, 
DJV, as a function of % and R 0 , for the heterogeneous network 
topology (panels a and b) and for the comparison between the het- 
erogeneous and the homogeneous case (panels c and d, respectively). 
The threshold behaviour is shown through the transition from the 
containment region (where DrJV=0) to the invasion region of the 
parameter space, where a finite fraction of the subpopulations of the 
system is reached by the infection. The role of the diffusion rate is 
shown in Figure 4a where different transitions are obtained as a 
function of x f° r the same value of the reproductive number. As 
expected from equation (5), at fixed values of x and R 0 , larger dif- 
fusion rates favor the spread and can bring the system above the 
threshold. The dependence of the invasion behaviour on x and Ro 
is reported in Figure 4b. For smaller values of the critical threshold 
in _R 0 increases; the smaller the value of x and the larger needs to be 
the transmission potential of the disease so that it can reach global 
proportion and spread throughout the metapopulation system. For 
X=0A, instead, only diseases very close to the epidemic threshold 
R 0 = 1 are contained at the source. Given the role of the diffusion rate 
in the invasion process, as shown in Figure 4a, we tested that the 
enhancement of the spreading at the system level due to an increase 
of the value of x is not related to an increase of the volume of people 
travelling, induced by the constraint of fixing the average length of 
stay across all numerical experiments. By imposing a constant t, the 
increase of % corresponds actually to a slight decrease of the average 
total traffic per link, i.e. the sum of people leaving and coming back to 



a given subpopulation in the quasi-stationary approximation, as illu- 
strated in the Methods section. The global invasion is therefore 
favoured by a more efficient spreading mechanism allowed by the 
presence of central nodes characterized by a large number of con- 
nections and large visiting times. Finally, we report on the good 
agreement between the individual-level simulations and the corres- 
ponding analytical results presented in the previous subsection, as 
shown by the vertical and color-coded arrows in the Figure indi- 
cating the analytical values of the transition. 

The effect of the heterogeneity of the metapopulation network 
structure on the global invasion threshold is further tested in 
Figure 4c and d where the simulation results obtained by comparing 
two different network topologies, heterogeneous vs. homogeneous 
structures with equal size and average degree, respectively, are 
shown, all other demographic, mobility, and disease parameters 
(-Ro = 1-2) kept equal. Global invasion is reached for a wide range of 
X and a values explored in the heterogeneous case, whereas the 
invasion occurs only for a considerably smaller set of values of x 
and a in the case of a homogeneous metapopulation network. 

Next to the transition behaviour, the effect of degree-correlated 
heterogeneous length of stay is also evident in the spatial propagation 
of the epidemic in the system above the invasion threshold. By focus- 
ing, in this analysis, on simulated epidemics starting from the same 
initial conditions, it is possible to build the epidemic invasion tree that 
represents the most probable transmission of the infection from one 
subpopulation to the other during the history of the epidemic 10 . The 
stochastic nature of the epidemic process implies that each realization 
will produce a different tree. An overall epidemic invasion network can 
be constructed by weighting each link of propagation from subpopula- 
tion i to subpopulation j with the probability of occurrence of the 
transmission along the mobility connection over the various stoch- 
astic realizations, and then the corresponding minimum spanning tree 
can be extracted to eliminate loops and focus on the main directed 
paths of transmission 10 (see the Methods section for further details). 

Figure 5 presents a visualization of the invasion trees obtained for a 
heterogeneous network where positively and negatively- correlated 
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Figure 6 | Threshold condition for global invasion for shorter infectious 
periods: simulation results, (a) Average final fraction of infected 
subpopulations DJV (symbols) and standard deviations (error bars) 
obtained from simulations as a function of the basic reproductive number 
Ro, for different SIR-like disease characterized by different infectious 
periods. Here we consider a diffusion rate rescaling <r = 4-10~ 5 and y= — 1. 
All other parameters are set as in Figure 4b. (b) Average final fraction of 
infected subpopulations DJV (symbols) and standard deviations (error 
bars) obtained from simulations as a function of the length of stay 
parameter for different values of the diffusion rate rescaling a. Here the 
same parameters of Figure 4a are used, except for a shorter infectious 
period, corresponding to ;i=0.05. 

lengths of stay are considered. A node is chosen at random and it is 
used as the starting condition of 100 stochastic realizations of the SIR 
epidemic in both cases. The resulting trees can be mapped out in 
terms of successive layers of infection from the origin, with nodes in 
the first layer ordered by degree (size of the dot) and by seeding time 
(color), showing how different values of % alter the hierarchy of the 
epidemic invasion from one subpopulation to another. For x > 0» 
largely connected subpopulations have a predominant role in the 
further spatial spread of the disease, thanks to the two-fold favouring 
property of having a large degree and a longer visiting time. A dif- 
ferent picture is obtained for j;<0, where the spreading potential of 
hub subpopulations, due to their high degree, is suppressed by the 
very short time duration that individuals spend there. More peri- 
pheral subpopulations become instead mainly responsible of the 
spreading dynamics towards the rest of the system. 

Role of the timescale assumption. The results presented in the 
above subsections are obtained in the relatively long infectious 



period assumption, i.e. ^ _1 2>Ty c , to ensure the applicability of the 
timescale separation technique and allow the comparison between 
analytical and numerical findings. In the simulations we assumed an 
infectious period /( _1 = 500 time steps expressed in arbitrary time 
units. The validity of such approximation clearly depends on 
the specific application and the spatial and temporal resolution 
considered. If we focus on human daily activities, based e.g. on 
the data obtained from cellular phones 3435 or high-resolution 
surveys 7,8,24 , realistic values of the mobility timescale of the system 
range from few minutes to few hours, translating the infectious 
period (JT 1 = 500 time steps in a duration of approximately few 
days, assuming the minimum timestep of the system to be equal to 
few minutes. The developed framework thus allows the study 
of infections such as influenza-like-illness spreading on a 
metapopulation network of locations visited during daily activities. 
A metapopulation model applied to the air travel mobility, often 
used for the study of the global spread of infectious diseases 5,10 ' 44 , 
would instead offer a different picture. Here the typical timescale 
of the system is of the order of days, thus an infectious period of 
^ _1 = 500 timesteps, as chosen in the simulations, would correspond 
to 500 days given the minimum timestep of the system being equal to 
1 day, and thus it would be much larger than the value corresponding 
to infectious diseases like influenza or SARS. In order to test the 
validity of our assumption on the length of the infectious period, 
we perform additional simulations by considering SIR-like diseases 
characterized by increasingly smaller pT 1 values, and evaluate the 
effect of the heterogeneous length of stay on the invasion dynamics. 

Figure 6 shows the results of the transition observed from the 
containment to the invasion region expressed in terms of the final 
fraction DJV of infected subpopulations, as a function of the repro- 
ductive number R 0 (panel a) and of the parameter % (panel b), and for 
different SIR-like diseases. We explored values of the infectious per- 
iod that range from /(~ 1 = 500 to [i~ 1= 4 timesteps, by keeping the 
same average length of stay % as in the previous results. The cases 
/i~ 1= 4 and ii 1 = 10, once expressed in days, correspond to the 
length of the infectious period of diseases as influenza and SARS, 
respectively, and therefore they allow us to explore the validity of the 
framework applied to a metapopulation model coupled by air travel 
mobility, where the timestep of the simulation is typically 1 day 5,10,44 . 

While a transition indicating the presence of an invasion threshold 
is recovered, the quantitative values of the threshold show a depend- 
ence on [i that is not found in the analytical expression of Eq. (5), 
obtained under the relatively long infectious period approximation. 
By changing pC 1 of two orders of magnitude, however, the relative 
variation observed in the R 0 threshold value is quite limited and 
approximately equal to 10%. In addition, the dependence of the 
global invasion threshold on the parameter x tuning the length of 
stay is maintained, as reported in Figure 6b for different values of the 
diffusion rates, and /.( = 0.05. While the analytical predictions are 
affected by the break-down of the timescale approximation, numer- 
ical results show the robustness of the threshold behaviour of the 
metapopulation system, indicating the applicability of the intro- 
duced theoretical framework to a wide range of geotemporal scales, 
and the importance of heterogeneous lengths of stay in the perspec- 
tive of epidemic control. 

Finally, we report in Figure 7 on the spatial invasion pattern by 
studying the epidemic invasion trees in the case of jT l = 20, corres- 
ponding to the SIR metapopulation dynamics of Figure 6b with 
o"=10~ 4 . Even in the absence of relatively long infectious periods, 
the resulting pattern show the critical role of % in shaping the hier- 
archy of the invasion and the spreading role of subpopulations in 
different degree classes. 

Discussion 

Heterogeneities have long been recognized to be important in the 
spreading of an infectious disease, both at the contact level between 
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Figure 7 | Epidemic invasion trees for shorter infectious periods. The cases of a positively-correlated (#=0.4) and negatively-correlated (y_ = — 1) length 
of stay are shown. The synthetic network is characterized by a power-law distribution P(k)cck~ r with y=2.1, size V= 10" subpopulations and average 
population size N = 10 3 . An SIR dynamics starting from the same seeding node (at the center of each visualization) is simulated, with R Q = 1.8, a= 10~ 4 , 
t = 37. All parameters are thus set equal to the ones in Figure 5, except for the shorter value of the infectious period considered, corresponding to p.= 0.05. 
As in Figure 5, only the first 120 nodes to be infected are displayed for the sake of visualization, on successive layers of invasion. Larger width grey links 
correspond to the paths of infection and lighter grey ones to the existing connections among visible nodes. Nodes are colour coded according to the time 
of their seeding, and their size scales with their degree; nodes in the first layer are ordered according to their degree to highlight the role of different degree 
nodes in the hierarchical invasion pattern in the two cases. 



individuals and at the connection level between subpopulations of 
individuals. Here, we have considered an extra layer of heterogeneity 
characterizing the host dynamics in terms of the duration of visits 
and found how large fluctuations in this quantity strongly alter the 
conditions for the disease invasion. Given the assumed dependence 
of the length of stay with the subpopulation degree, both the theor- 
etical framework and the applied numerical simulations have shown 
that two regimes are found that may dramatically favour or hinder 
the invasion, induced by the positive or negative degree-correlation, 
respectively, altering the predictions of simpler models. Despite its 
simplicity, the present framework uncovered an important aspect 
that must be considered in interpreting and simulating epidemic 
spreading patterns, and in providing detailed model predictions. 
As the spatial spread plays a crucial role in the management and 
control of a disease, the present results call for the need for higher 
resolution mobility data to better characterize the length of stay and 
include additional realistic aspects - such as, e.g., the heterogeneity of 
the travel behaviour of the population, the dependence of mobility 
rates on the distance travelled, and others - that may enable an 
increasingly accurate description of the disease propagation. 

Methods 

Mobility at the equilibrium. The rate equations describing the non-Markovian 
travelling dynamics (see SI for more details) can be written in terms of the variables 
N kk (t) and N kk (t) by adopting a degree-block notation that assumes statistical 
equivalence for subpopulations of equal degree k 27 

Wkk(t) = -a k N tk {t)+kl. k N kt (t)P{k\k)x^ 

S,N kk it) = a kk N kk (t) -N kk {t)x k - 1 

where o k — kL k G kk P{k \k) represents the overall leaving rate out of k, and the 
conditional probability P{k \k) represents the probability that a node with degree k is 
connected to a node with degree k' . The condition d t N kk (t) — d t N kk (f) — 0 yields the 
equilibrium solutions, equations (2) and (3). We first write the equilibrium relation 
Nkk' = Nkk&kk T k' > anQ then plug it into the expression for the total number of 



individuals resident in the subpopulation k, N k — N kk + kX k Nffl P{k fc) . By making 
explicit the variable N kk in the latter equation and considering that for uncorrelated 
networks the conditional probability is given by the expression P(k \k) — kP{k) /(k), 
we recover the expressions (2) and (3). The full derivation is reported in the SI. 

From equations (2) and (3) we can compute T kk , the total volume of people 
travelling along each link at the equilibrium, that is the sum of people resident in A: and 
travelling to their destination k' , and people resident in k! returning after visiting k 

T kk = j^(kk) (v k + v k ) 

T kk is function of all the parameters of the system, and in particular of the exponent 
X- Therefore, scenarios with different values of x differ in the value of the average total 
traffic, since we impose the same average length of stay t. For the case of uncorrelated 
network, the average total traffic volume { T kk ) is given by the expression 

where we have highlighted the dependency on y u . {T^} decreases as the value of % 
increases with a relative variation no greater than 2% considering the two cases %= — 1 
and x =0.4 (all other parameter values set as in the main text). 

Calculation of the global invasion threshold. In order to obtain the explicit form for 
equation (4) we plug into the equation the expression for the number of seeds 
^fcfe' =a {Nkk +Nfc'fc)> where N k ^ zndN k fc are given by the equilibrium relation (3). For 
the attack rate a we take the approximate expression valid under the condition 57 
2( r 

Rq = 1, y. = — ^ ■ ^ e ODtam in such a way the following expression for the 

equation (4) 0 

D" k = C[k" +l v k P(k)l. k D" k - 1 (k - l)k tt ^- + k"^P{k)l. k iy k ^(k - l)k\] (7) 

where C — - - , — — , , , ~ — -. At this point we can write a close form of the above 
iterative process by defining the vector 30 0" — f®",®")' wnere 

®'{ = Y. k (k-l)k° + *Dl 

®" 2 =Y. k {k-\)k°v k D" k 

The dynamics equations of 0" are encoded in equation (7) and can be written as 
0 M = CG0 M ~\ with G being the two-dimensional matrix with elements 
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gl2 = ((fc-l)^ +1 ) 

g2l = ((k-l)k»>+<vl) 
The dynamics is driven by the largest eigenvalue of G, which defines the function 
A({P(k)},a,rjf) = ((k-l)k 20+ x +1 v k ) + sj{(k-l)k 2 ( (, +x) + 1 }((k-l)k 20 + l v 2 k ) 
appearing in the equation (5) of the global invasion threshold parameter. 

Numerical simulations. We explicitly simulate both the travelling among different 
subpopulations and the infection transmission within each subpopulation as discrete- 
time stochastic processes, treating individuals as integer units. At each time step, 
travelling individuals along with new infectious and recovering individuals for each 
subpopulation are extracted randomly from binomial and multinomial distributions. 
More in detail, the system is updated according to the following rules, (a) Infection 
dynamics: (i) The contagion process assumes the homogenous mixing within each 
population, specifically at each time step the force of infection F k within a 

subpopulation of degree k, is given by F k — where I£ and N£ are respectively the 

total number of infectious and the total number of individuals present at that moment 

in the subpopulation, i.e. JJ =Iu c -\-k'L l :I l : k P(Jc\k) t and N£ = N kk + kl, k 'N 1( : k P(jc\ky, 
(ii) At the same time, each infectious individual is subject to the recovery process with 
rate \l . (b) After all nodes have been updated for the local infection process, we 
simulate the diffusion process by randomly extracting for all nodes the number of 
individuals departing to each of the k destination and the ones coming back. 

Epidemic invasion tree. The epidemic invasion tree is a directed, weighted minimum 
spanning tree among all the possible propagation paths starting from the same initial 
condition, extracted as follows. For each subpopulation pair Ij, we define pu as the 
probability of infection transmission from 1 to j. This probability shows the likelihood 
that subpopulation j 's infection is seeded by subpopulation /. This can happen by two 
means: either a resident in j acquires the infection in / and brings it home, or an 
infectious traveller from / brings the infection to j. Then, pn is defined as the 
proportion of runs, where j has been seeded by /. Finally, we define a distance metric 
dy = y/1 —pij to measure dissimilarities for the infection probability. The minimum 
spanning tree is then calculated using the Chu-Liu-Edmunds Algorithm 58 . 
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